视频框架插值是一项艰巨的任务,这是由于不断变化的现实场景。先前的方法通常计算双向光流,然后在线性运动假设下预测中间光流,从而导致各向同性中间流量产生。随访研究通过估计的高阶运动信息和额外的帧获得各向异性调整。基于运动假设,它们的方法很难在真实场景中对复杂的运动进行建模。在本文中,我们提出了一种端到端训练方法A^2OF,用于视频框架插值,并通过事件驱动的各向异性调整光学流量调节。具体而言,我们使用事件为中间光流生成光流分布掩码,这可以对两个帧之间的复杂运动进行建模。我们提出的方法在视频框架插值中优于先前的方法,将基于事件的视频插值带到了更高的阶段。
translated by 谷歌翻译
具有许多预训练模型(PTM)的模型中心已经是深度学习的基石。尽管以高成本建造,但它们仍然保持\ emph {探索}:从业人员通常会通过普及从提供的模型中心中选择一个PTM,然后对PTM进行微调以解决目标任务。这种na \“我的但共同的实践构成了两个障碍,以充分利用预训练的模型中心:(1)通过受欢迎程度选择的PTM选择没有最佳保证;(2)仅使用一个PTM,而其余的PTM则被忽略。理想情况下。理想情况下。 ,为了最大程度地利用预训练的模型枢纽,需要尝试所有PTM的所有组合和广泛的微调每个PTM组合,这会产生指数组合和不可偿还的计算预算。在本文中,我们提出了一种新的范围排名和调整预训练的模型:(1)我们的会议论文〜\ citep {you_logme:_2021}提出的logMe,以估算预先训练模型提取的标签证据的最大值,该标签证据可以在模型中排名所有PTMS用于各种类型的PTM和任务的枢纽\ Emph {微调之前}。(2)如果我们不偏爱模型的体系结构,则可以对排名最佳的PTM进行微调和部署,或者可以通过TOPE调整目标PTM -k通过t排名PTM他提出了b-tuning算法。排名部分基于会议论文,我们在本文中完成了其理论分析,包括启发式证据最大化程序的收敛证明和特征维度的影响。调整零件引入了一种用于调整多个PTM的新型贝叶斯调整(B-Tuning)方法,该方法超过了专门的方法,该方法旨在调整均匀的PTMS,并为调整异质PTMS设置了一种新的技术。利用PTM枢纽的新范式对于整个机器学习社区的大量受众来说可能会很有趣。
translated by 谷歌翻译
在本文中,我们介绍了Tianshou,这是一个高度模块化的Python库,用于深钢筋学习(DRL),它使用Pytorch作为后端。天舒(Tianshou)打算通过提供DRL算法的灵活和可靠的基础架构来对研究进行研究。它通过统一界面通过20多种经典算法来支持在线和离线培训。为了促进相关的研究并证明天舒的可靠性,我们发布了田肖(Tianshou)的Mujoco环境基准,涵盖了八种具有最先进性能的经典算法。我们通过https://github.com/thu-ml/tianshou/开放源。
translated by 谷歌翻译
New architecture GPUs like A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but efficiently utilizing it can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning efforts. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to effectively employ MIG, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released on https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.
translated by 谷歌翻译
Learning to predict masked tokens in a sequence has been shown to be a powerful pretraining objective for large-scale language models. After training, such masked language models can provide distributions of tokens conditioned on bidirectional context. In this short draft, we show that such bidirectional conditionals often demonstrate considerable inconsistencies, i.e., they can not be derived from a coherent joint distribution when considered together. We empirically quantify such inconsistencies in the simple scenario of bigrams for two common styles of masked language models: T5-style and BERT-style. For example, we show that T5 models often confuse its own preference regarding two similar bigrams. Such inconsistencies may represent a theoretical pitfall for the research work on sampling sequences based on the bidirectional conditionals learned by BERT-style MLMs. This phenomenon also means that T5-style MLMs capable of infilling will generate discrepant results depending on how much masking is given, which may represent a particular trust issue.
translated by 谷歌翻译
This paper presents a practical global optimization algorithm for the K-center clustering problem, which aims to select K samples as the cluster centers to minimize the maximum within-cluster distance. This algorithm is based on a reduced-space branch and bound scheme and guarantees convergence to the global optimum in a finite number of steps by only branching on the regions of centers. To improve efficiency, we have designed a two-stage decomposable lower bound, the solution of which can be derived in a closed form. In addition, we also propose several acceleration techniques to narrow down the region of centers, including bounds tightening, sample reduction, and parallelization. Extensive studies on synthetic and real-world datasets have demonstrated that our algorithm can solve the K-center problems to global optimal within 4 hours for ten million samples in the serial mode and one billion samples in the parallel mode. Moreover, compared with the state-of-the-art heuristic methods, the global optimum obtained by our algorithm can averagely reduce the objective function by 25.8% on all the synthetic and real-world datasets.
translated by 谷歌翻译
Through a study of multi-gas mixture datasets, we show that in multi-component spectral analysis, the number of functional or non-functional principal components required to retain the essential information is the same as the number of independent constituents in the mixture set. Due to the mutual in-dependency among different gas molecules, near one-to-one projection from the principal component to the mixture constituent can be established, leading to a significant simplification of spectral quantification. Further, with the knowledge of the molar extinction coefficients of each constituent, a complete principal component set can be extracted from the coefficients directly, and few to none training samples are required for the learning model. Compared to other approaches, the proposed methods provide fast and accurate spectral quantification solutions with a small memory size needed.
translated by 谷歌翻译
Neural operators, which emerge as implicit solution operators of hidden governing equations, have recently become popular tools for learning responses of complex real-world physical systems. Nevertheless, the majority of neural operator applications has thus far been data-driven, which neglects the intrinsic preservation of fundamental physical laws in data. In this paper, we introduce a novel integral neural operator architecture, to learn physical models with fundamental conservation laws automatically guaranteed. In particular, by replacing the frame-dependent position information with its invariant counterpart in the kernel space, the proposed neural operator is by design translation- and rotation-invariant, and consequently abides by the conservation laws of linear and angular momentums. As applications, we demonstrate the expressivity and efficacy of our model in learning complex material behaviors from both synthetic and experimental datasets, and show that, by automatically satisfying these essential physical laws, our learned neural operator is not only generalizable in handling translated and rotated datasets, but also achieves state-of-the-art accuracy and efficiency as compared to baseline neural operator models.
translated by 谷歌翻译
Diagram object detection is the key basis of practical applications such as textbook question answering. Because the diagram mainly consists of simple lines and color blocks, its visual features are sparser than those of natural images. In addition, diagrams usually express diverse knowledge, in which there are many low-frequency object categories in diagrams. These lead to the fact that traditional data-driven detection model is not suitable for diagrams. In this work, we propose a gestalt-perception transformer model for diagram object detection, which is based on an encoder-decoder architecture. Gestalt perception contains a series of laws to explain human perception, that the human visual system tends to perceive patches in an image that are similar, close or connected without abrupt directional changes as a perceptual whole object. Inspired by these thoughts, we build a gestalt-perception graph in transformer encoder, which is composed of diagram patches as nodes and the relationships between patches as edges. This graph aims to group these patches into objects via laws of similarity, proximity, and smoothness implied in these edges, so that the meaningful objects can be effectively detected. The experimental results demonstrate that the proposed GPTR achieves the best results in the diagram object detection task. Our model also obtains comparable results over the competitors in natural image object detection.
translated by 谷歌翻译
Despite excellent performance in image generation, Generative Adversarial Networks (GANs) are notorious for its requirements of enormous storage and intensive computation. As an awesome ''performance maker'', knowledge distillation is demonstrated to be particularly efficacious in exploring low-priced GANs. In this paper, we investigate the irreplaceability of teacher discriminator and present an inventive discriminator-cooperated distillation, abbreviated as DCD, towards refining better feature maps from the generator. In contrast to conventional pixel-to-pixel match methods in feature map distillation, our DCD utilizes teacher discriminator as a transformation to drive intermediate results of the student generator to be perceptually close to corresponding outputs of the teacher generator. Furthermore, in order to mitigate mode collapse in GAN compression, we construct a collaborative adversarial training paradigm where the teacher discriminator is from scratch established to co-train with student generator in company with our DCD. Our DCD shows superior results compared with existing GAN compression methods. For instance, after reducing over 40x MACs and 80x parameters of CycleGAN, we well decrease FID metric from 61.53 to 48.24 while the current SoTA method merely has 51.92. This work's source code has been made accessible at https://github.com/poopit/DCD-official.
translated by 谷歌翻译